1 Executive Summary

This document details the steps performed to arrive at the final recommendation. Based on the Break Even Period (BEP, number of months required to recover cost of property), we recommend investing in the Zipcodes 11215, 11231, 11217 & 10025 . These results apply to short term stay properties with 2 bedrooms in New York City. Initially, we performed Data Quality Check and identified important factors for assessing investment potential, followed by Exploratory data analysis to study the trends. Eventually, we defined customized Business Metrics and analyzed the summarized BEP to come up with our recommendations. The code written is highly scalable and is built for reusability.

2 Problem Statement

You are consulting for a real estate company that has a niche in purchasing properties to rent out short-term as part of their business model specifically within New York City. The real estate company has already concluded that two bedroom properties are the most profitable; however, they do not know which zip codes are the best to invest in.

The real estate company has engaged your firm to build out a data product and provide your conclusions to help them understand which zip codes would generate the most profit on short term rentals within New York City. You will be looking at publicly available data from Zillow and AirBnB:

After meeting with the strategy team, you’ve got an idea of where to start, key concerns, and how you can help this real estate company with the market data while keeping the following assumptions in mind:

3 R Environment Setup

The following code chunk will install and load required R packages. The following packages have been used throughout the analysis for various purposes -

pkg.instl <- function(x) {
  if (!requireNamespace(x, quietly = TRUE))
    install.packages(x)
}

4 Data Processing

Based on my experience, DQ checks and Data Processing are highly iterative processes where the former feeds the latter; however, to have a clear objective in mind about the final outcome is the starting step. This report not only documents my code and processes, but it is also a sequential narration of my logical thinking

4.1 Import Data

First things first, lets import the data. The following code chunk contains functions for easing data loading

imp <- function(x) {
        suppressMessages(df <- as.data.frame(read_csv(x)))
        cat(dim(df))
        return(df)
}

head.cust <- function(x) {
  kable(head(x))  %>%
  kable_styling(bootstrap_options = c("responsive","striped")) %>%
  scroll_box(width = "100%", height = "200px")
}
  • For Airbnb - Revenue Data,
listing <- imp("listings.csv")
## 40753 95
head.cust(listing)
id listing_url scrape_id last_scraped name summary space description experiences_offered neighborhood_overview notes transit access interaction house_rules thumbnail_url medium_url picture_url xl_picture_url host_id host_url host_name host_since host_location host_about host_response_time host_response_rate host_acceptance_rate host_is_superhost host_thumbnail_url host_picture_url host_neighbourhood host_listings_count host_total_listings_count host_verifications host_has_profile_pic host_identity_verified street neighbourhood neighbourhood_cleansed neighbourhood_group_cleansed city state zipcode market smart_location country_code country latitude longitude is_location_exact property_type room_type accommodates bathrooms bedrooms beds bed_type amenities square_feet price weekly_price monthly_price security_deposit cleaning_fee guests_included extra_people minimum_nights maximum_nights calendar_updated has_availability availability_30 availability_60 availability_90 availability_365 calendar_last_scraped number_of_reviews first_review last_review review_scores_rating review_scores_accuracy review_scores_cleanliness review_scores_checkin review_scores_communication review_scores_location review_scores_value requires_license license jurisdiction_names instant_bookable cancellation_policy require_guest_profile_picture require_guest_phone_verification calculated_host_listings_count reviews_per_month
7949480 https://www.airbnb.com/rooms/7949480 2.01705e+13 2017-05-03 City Island Sanctuary relaxing BR & Bath w Parking Come relax on City Island in our quiet guest room w/adjacent private bath. Enjoy a Continental Breakfast served by Didier at your table downstairs. Walk to restaurants, shops, the park or the beach. Park one car on our private driveway. If you are looking to explore Manhattan you can take the 29 bus to the nearest subway number 6. Or you can take the BXM8 express bus at the Pelham Bay Subway station which goes directly into Manhattan. There are also green taxis at the subway. On parle français et anglais, (lire Français ci-dessous). I am a native New Yorker and Didier grew up in Biarritz, France. Our three friendly cats do not go into the room. The bathroom is yours during your entire stay (we have our own en suite). We will provide a Continental breakfast with fresh bread butter & jams, fresh squeezed orange juice or fresh fruit, tea or coffee served by my French husband who is a talented cook! You might even get a special treat of canelés de Bordeaux, banana bread or home made scones! City Island Sanctuary relaxing BR & Bath w Parking Pièce calme, ensoleillée et à proximité d’une salle de bain privée. Marchez jusqu, au parc ou à la plage. On peut aller à pied à quelques restaurants, une petite supermarché, petits magasins et le City Island Nautical Museum. Prenez le metro ou le bus express à Manhattan (une heure et demi à centre-ville. Les chats sont en résidence, mais ne vont pas dans la chambre. Cable TV avec TV5 Monde, lecteur DVD, Wi-fi . Petit Come relax on City Island in our quiet guest room w/adjacent private bath. Enjoy a Continental Breakfast served by Didier at your table downstairs. Walk to restaurants, shops, the park or the beach. Park one car on our private driveway. If you are looking to explore Manhattan you can take the 29 bus to the nearest subway number 6. Or you can take the BXM8 express bus at the Pelham Bay Subway station which goes directly into Manhattan. There are also green taxis at the subway. On parle français et anglais, (lire Français ci-dessous). I am a native New Yorker and Didier grew up in Biarritz, France. Our three friendly cats do not go into the room. The bathroom is yours during your entire stay (we have our own en suite). We will provide a Continental breakfast with fresh bread butter & jams, fresh squeezed orange juice or fresh fruit, tea or coffee served by my French husband who is a talented cook! You might even get a special treat of canelés de Bordeaux, banana bread or home mad none City Island is a unique sanctuary in New York City’s the Bronx. It boasts many wonderful seafood restaurants, two great pubs and the only true French bistro in the borough. We are walking distance to the largest park in the City of New York, Pelham Bay Park and also to lovely Orchard Beach, the Riviera of the Bronx! We are walkable to Pelham Park which is the largest park in NYC. It has picnic areas, tennis courts, horseback riding, Orchard Beach and a lovely rocky coast. Check out their website for all there is to do! On the island, you can walk to everything. A local bus will take you to shopping at the Mall at Bay Plaza near Coop City. If you want to go to Manhattan you take the local 29 bus to the subway No. 6 to Manhattan. The local bus does not run after midnight but there are green taxis at the subway station. There is an express bus early in the morning which goes directly from the island to Manhattan. It will take at least one hour to get to midtown by subway and a bit longer on the express bus depending upon traffic. Feel free to enjoy our large backyard in season, our downstairs large recreation area with TV, WIFI. Walk to the local beaches for the view or a dip. Please, no food or drinks except water in the guest bedroom. We speak fluent French and English. We will be in residence during your stay to help you with recommendations for a real New York experience or a unique City Island French experience. But, we do respect your privacy! No extra guests or visitors who are not in the reservation. Maximum 2 registered guests only. No late check ins or check outs. Please, no food or drinks in the guest bedroom except water. No smoking on the property. It’s a residential neighborhood so we keep it quiet after 10. Please only inquire about booking for yourself, we will not accept 3rd party inquiries. https://a0.muscache.com/im/pictures/001d1cb0-6ef2-4074-bd39-e486ee58d00f.jpg?aki_policy=small https://a0.muscache.com/im/pictures/001d1cb0-6ef2-4074-bd39-e486ee58d00f.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/001d1cb0-6ef2-4074-bd39-e486ee58d00f.jpg?aki_policy=large https://a0.muscache.com/im/pictures/001d1cb0-6ef2-4074-bd39-e486ee58d00f.jpg?aki_policy=x_large 119445 https://www.airbnb.com/users/show/119445 Linda & Didier 2010-05-06 New York, New York, United States I am a designer, former instructor at a design school and my husband is a talented French cook (he is from France). He is also an artist, writer and a Zen practitioner. We are sociable but respectful of your privacy.
I am a native New Yorker having relocated back to NYC from So Cal because I Love New York! We are the only French/American Airbnb in the Bronx!
within an hour 100% N/A TRUE https://a0.muscache.com/im/pictures/2de30fe4-bebe-4145-a86a-b8a8a970078d.jpg?aki_policy=profile_small https://a0.muscache.com/im/pictures/2de30fe4-bebe-4145-a86a-b8a8a970078d.jpg?aki_policy=profile_x_medium City Island 1 1 [‘email’, ‘phone’, ‘reviews’, ‘jumio’] TRUE TRUE City Island, Bronx, NY 10464, United States City Island City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.85205 -73.78868 TRUE House Private room 2 1 1 1 Real Bed {“Cable TV”,“Wireless Internet”,“Air conditioning”,“Free parking on premises”,Breakfast,“Pets live on this property”,Cat(s),“Indoor fireplace”,Heating,“Smoke detector”,“Carbon monoxide detector”,“Fire extinguisher”,Essentials,Shampoo,Hangers,“Hair dryer”,Iron,“translation missing: en.hosting_amenity_49”,“translation missing: en.hosting_amenity_50”} NA $99.00 NA NA $100.00 NA 1 $20.00 1 7 yesterday NA 24 54 80 170 2017-05-03 25 2016-01-18 2017-04-23 100 10 10 10 10 10 10 FALSE NA NA FALSE moderate TRUE TRUE 1 1.59
16042478 https://www.airbnb.com/rooms/16042478 2.01705e+13 2017-05-04 WATERFRONT STUDIO APARTMENT My place is close to Sea Shore. You’ll love my place because of the coziness, the location, the views, and the people. My place is good for couples, solo adventurers, business travelers, and families (with kids). (URL HIDDEN) My place is close to Sea Shore. You’ll love my place because of the coziness, the location, the views, and the people. My place is good for couples, solo adventurers, business travelers, and families (with kids). (URL HIDDEN) public transport from Grand Central, NYC or from JFK, Laguadia is about 30 minutes away. (URL HIDDEN) Fine sea food restaurant, bars and night clubs all within walking distance, Sailing, fishing, canoeing, marinas and just 30 minutes away from central NYC or major airports like JFK or LaGuardia airports. Enjoy NYC none (URL HIDDEN) Fine sea food restaurant, bars and night clubs all within walking distance, Sailing, fishing, canoeing, marinas and just 30 minutes away from central NYC or major airports like JFK or LaGuardia airports. Enjoy NYC NA public transport from Grand Central, NYC or from JFK, Laguadia is about 30 minutes away. NA NA https://a0.muscache.com/im/pictures/bb5bc3c4-3eb2-4b90-a2c8-28401c6eabcc.jpg?aki_policy=small https://a0.muscache.com/im/pictures/bb5bc3c4-3eb2-4b90-a2c8-28401c6eabcc.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/bb5bc3c4-3eb2-4b90-a2c8-28401c6eabcc.jpg?aki_policy=large https://a0.muscache.com/im/pictures/bb5bc3c4-3eb2-4b90-a2c8-28401c6eabcc.jpg?aki_policy=x_large 9117975 https://www.airbnb.com/users/show/9117975 Collins 2013-09-29 New York, New York, United States I am married with 3 children and 2 grandkid. Family and community is the most important thing besides God. I am a Tax consultant and a I have travelled to a few countries including Germany, France, Holland, Netherlands, England in Europe. I have also travel extensively within the Caribbean and Mexico so I appreciate the importance of comfortable and affordable lodging when one is away. a few days or more 0% N/A FALSE https://a0.muscache.com/im/users/9117975/profile_pic/1423794446/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/9117975/profile_pic/1423794446/original.jpg?aki_policy=profile_x_medium City Island 1 1 [‘phone’, ‘facebook’] TRUE FALSE City Island, Bronx, NY 10464, United States City Island City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.85349 -73.78861 TRUE Apartment Private room 4 1 1 1 Real Bed {TV,Internet,“Wireless Internet”,“Air conditioning”,Kitchen,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“First aid kit”,“Fire extinguisher”,Essentials,Shampoo,Hangers,“Hair dryer”,Iron} NA $200.00 NA NA NA NA 1 $0.00 7 29 6 months ago NA 30 60 90 180 2017-05-04 0 NA NA NA NA NA NA NA NA NA FALSE NA NA TRUE flexible FALSE FALSE 1 NA
1886820 https://www.airbnb.com/rooms/1886820 2.01705e+13 2017-05-04 Quaint City Island Community. Quiet island boating town on Long Island Sound. 10 miles from Manhattan. Public trans. Walk to shops, restaurants, bars,parks. Near Botanical gardens, Bronx Zoo and Fordham University. Water view. off street parking. 3BR, 3 Full Baths. Washer/dryer. full kitchen.Fully furnished comfortable home with deck/backyard. local caretaker. Master bed with queen bed, full bath and office. Mid bedroom has queen bed. Small bed has single. TV room/den queen sleeper sofa. Galley kitchen off dining room. Piano. Winter water view. Large back deck with seating. Small front porch. Great for family of 4. Will fit 7. Additional$25.00 per person per night over 4 people. Quiet island boating town on Long Island Sound. 10 miles from Manhattan. Public trans. Walk to shops, restaurants, bars,parks. Near Botanical gardens, Bronx Zoo and Fordham University. Water view. off street parking. 3BR, 3 Full Baths. Washer/dryer. full kitchen.Fully furnished comfortable home with deck/backyard. local caretaker. Master bed with queen bed, full bath and office. Mid bedroom has queen bed. Small bed has single. TV room/den queen sleeper sofa. Galley kitchen off dining room. Piano. Winter water view. Large back deck with seating. Small front porch. Great for family of 4. Will fit 7. Additional$25.00 per person per night over 4 people. 2 story 1800 sq ft house all to yourselves. Local caretaker available during stay. We will be available by phone Small New England type town in the middle of the big city. Small Island in the Long Island Sound. Local marinas, seafood restaurants, yacht clubs, supermarket, library, shops. Parking available in driveway. Bus stop none Small New England type town in the middle of the big city. Small Island in the Long Island Sound. Local marinas, seafood restaurants, yacht clubs, supermarket, library, shops. NA Parking available in driveway. Bus stop around the corner. Near Orchard Beach, Bronx Zoo, World Famous Botanical Gardens, Fordham University and of course, Manhattan 2 story 1800 sq ft house all to yourselves. Local caretaker available during stay. We will be available by phone No smoking. No pets. This is our private home. We expect that you would treat it with care and enjoy it as we do. NA NA https://a0.muscache.com/im/pictures/26266928/14c688d9_original.jpg?aki_policy=large NA 9815788 https://www.airbnb.com/users/show/9815788 Steve 2013-11-04 US NA N/A N/A N/A FALSE https://a0.muscache.com/im/users/9815788/profile_pic/1383845917/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/9815788/profile_pic/1383845917/original.jpg?aki_policy=profile_x_medium City Island 1 1 [‘email’, ‘phone’] TRUE FALSE City Island, Bronx, NY 10464, United States City Island City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.84114 -73.78305 TRUE House Entire home/apt 4 3 3 3 Real Bed {TV,“Cable TV”,Internet,“Wireless Internet”,“Air conditioning”,Kitchen,“Free parking on premises”,Heating,Washer,Dryer} NA $300.00 NA NA $800.00 $100.00 4 $25.00 7 90 11 months ago NA 30 60 90 365 2017-05-04 0 NA NA NA NA NA NA NA NA NA FALSE NA NA FALSE strict FALSE FALSE 1 NA
6627449 https://www.airbnb.com/rooms/6627449 2.01705e+13 2017-05-05 Large 1 BDRM in Great location This ground floor apartment is light and airy with a fully equipped kitchen. Located in the heart of City Island w/ shops, restaurants and transportation nearby. If needed there is extra sleeping space on the modular sofa. We are close to fishing, boating, biking, horseback riding or hiking. We are near Orchard Beach and Pelham Bay Park. This is a very convenient area. This ground floor apartment is light and airy with a fully equipped kitchen. Located in the heart of City Island w/ shops, restaurants and transportation nearby. If needed there is extra sleeping space on the modular sofa. We are close to fishing, boating, biking, horseback riding or hiking. We are near Orchard Beach and Pelham Bay Park. This is a very convenient area. The entire apartment is available as I am only there part of the time. There is a patio in back off the living room. You are welcome to help yourself to coffee or tea. My daughter and her family live on the Island and will be there to greet you and help you out with any information you might like. City Island is a unique and a hidden gem of New York City. Many New Yorkers do not even know it exists. It is known for its seafood and sailing, with many seafood restaurants in various price ranges. We are close to grocery stores, library, City Island Diner, antique stores and art galleries. It is a very safe and walk none City Island is a unique and a hidden gem of New York City. Many New Yorkers do not even know it exists. It is known for its seafood and sailing, with many seafood restaurants in various price ranges. We are close to grocery stores, library, City Island Diner, antique stores and art galleries. It is a very safe and walkable neighborhood. City Island is only 11/2 miles long and (3) blocks wide at its widest, with 2 seafood restaurants at the end with outdoor do it yourself seating There is off street parking in the apartment complex. I have a reserved space which you would be able to use. We are (2) blocks from the BX29 bus which you can take to the Pelham Bay subway station. ( a 10 minute ride) From there you can transfer to the #6 train to Manhattan; about 45 minutes. The entire apartment is available as I am only there part of the time. There is a patio in back off the living room. You are welcome to help yourself to coffee or tea. My daughter and her family live on the Island and will be there to greet you and help you out with any information you might like. There is no smoking and no pets. Otherwise use the place as you would your own. https://a0.muscache.com/im/pictures/83539171/13fe7e85_original.jpg?aki_policy=small https://a0.muscache.com/im/pictures/83539171/13fe7e85_original.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/83539171/13fe7e85_original.jpg?aki_policy=large https://a0.muscache.com/im/pictures/83539171/13fe7e85_original.jpg?aki_policy=x_large 13886510 https://www.airbnb.com/users/show/13886510 Arlene 2014-04-04 Los Angeles, California, United States NA within a few hours 100% N/A FALSE https://a0.muscache.com/im/users/13886510/profile_pic/1433129533/original.jpg?aki_policy=profile_small https://a0.muscache.com/im/users/13886510/profile_pic/1433129533/original.jpg?aki_policy=profile_x_medium City Island 1 1 [‘email’, ‘phone’, ‘reviews’, ‘kba’] TRUE TRUE City Island, City Island, NY 10464, United States City Island City Island Bronx City Island NY 10464 New York City Island, NY US United States 40.84977 -73.78661 TRUE Apartment Entire home/apt 3 1 1 1 Real Bed {TV,Internet,“Wireless Internet”,“Air conditioning”,Kitchen,“Free parking on premises”,“Buzzer/wireless intercom”,Heating,“Family/kid friendly”,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,Essentials,Shampoo,Hangers,“Hair dryer”,Iron} NA $125.00 $775.00 NA NA $75.00 1 $0.00 3 21 2 weeks ago NA 8 30 60 335 2017-05-05 12 2015-07-04 2016-10-24 93 10 10 10 10 10 10 FALSE NA NA FALSE strict FALSE FALSE 1 0.54
5557381 https://www.airbnb.com/rooms/5557381 2.01705e+13 2017-05-04 Quaint City Island Home Located in an old sea-shanty town, our home has a countryside feel, only an hour away from Manhattan’s excitement! With a bed big enough for two it’s ideal for business-(wo)men or out-of-towners who don’t want the hustle/bustle to follow them home. You won’t find a place so close to the city (NYC of course) with such a big back yard! There’s lots to do. We are only a train ride away from Manhattan in all it’s glory - we live a bus ride or a 10 min car ride from the Pelham Bay station on the 6 line, to Grand Central it’s less than an hour and a half by public transport, less than an hour if you drive to the train - and there’s plenty to do on City Island as well (or ‘the island’ as we locals call it). There are oodles of seafood restaurants withing walking distance (including a wonderful diner just a block away that has the best breakfast), swimming in Eastchester Bay during the summer, bike trails, Pelham Bay Park and Orchard beach! If you’re looking for some combination of the NYC life with outdoor adventure, this is the place for you! DISCLAIMER: If you are looking for a place from which to explore Manhattans night life, this is probably not the place for you. The public bus that runs from the end of the 6 train onto City Isl Located in an old sea-shanty town, our home has a countryside feel, only an hour away from Manhattan’s excitement! With a bed big enough for two it’s ideal for business-(wo)men or out-of-towners who don’t want the hustle/bustle to follow them home. You won’t find a place so close to the city (NYC of course) with such a big back yard! There’s lots to do. We are only a train ride away from Manhattan in all it’s glory - we live a bus ride or a 10 min car ride from the Pelham Bay station on the 6 line, to Grand Central it’s less than an hour and a half by public transport, less than an hour if you drive to the train - and there’s plenty to do on City Island as well (or ‘the island’ as we locals call it). There are oodles of seafood restaurants withing walking distance (including a wonderful diner just a block away that has the best breakfast), swimming in Eastchester Bay during the summer, bike trails, Pelham Bay Park and Orchard beach! If you’re looking for some combination of the NYC lif none City Island is unique in two ways. First, you get the small community feel, juxtaposed with NEW YORK CITY. Weird right? A small community, surrounded by water and parkland, so close to such a city! The second reason I love my neighborhood is it’s history. City Island used to be a oyster/fishing village (there are still places to rent boats and fishing equipment for those interested), and many inhabitants were boat builders ((website hidden)). City Island even has it’s own Nautical Museum, if you need a weekend afternoon activity. I am going to put this in a few places (also above in the overall description) as this has been a little bit of an issue for past guests: If you are looking for a place from which to explore Manhattans night life, this is probably not the place for you. The public bus that runs from the end of the 6 train onto City Island stops running a little before midnight. However, if you only want to do day trips, explore the Bronx (or Queens), and come back to a quiet house with no traffic sounds, this is the perfect location. The easiest way to get to Manhattan quickly is to drive to the Hunts Point station on the 6 train (this gets you to midtown in a little under an hour). If driving isn’t your thing, there is a public bus (the Bx29) that runs from City Island to the Pelham Bay station (the last station on the 6 line in the Bronx), this option will take you a little less than an hour and a half to get to midtown manhattan. Fare is now $2.75 per ride, and comes with a free transfer. Parking is available on the street in front of the property (we do have a driveway in a pinch, but it’s a little inconvenient). The house is just a place to sleep (and cook and shower). We have a great big back yard (by city standards) with a picnic table for outdoor diners. There is also a garage space with weights and some boxing equipment for the exercise-inclined. I live in the back house (there are two houses on the property, my father lives in the front house) full time. The house is generally empty during weekdays, but the living room, kitchen and bathroom are shared spaces, so interaction will be inevitable. However, I’m pretty laid back. As long as you do your dishes! ;) Only three actual rules: 1. Don’t smoke in the house. There is a beautiful back yard for that. 2. Strip the bed before you check out and leave the sheets in a pile on the bed so I can just grab them and throw them in the wash. 3. Take out the trash on our way out as you leave. The garbage is located to the left of the stair as you’re exiting the front door. Other than that, I’m a very reasonable person. I’ll try to respect your space and clean up after myself while you’re here, and I expect you to do the same. Nobody gets everything exactly right the first time and 2nd chances make the world go round, but that doesn’t mean we shouldn’t do our best. I love pets. Furry, scaly, feathery, doesn’t matter. Pets are not only allowed, they’re encouraged! I don’t have a pet at the moment, but that very well may change! https://a0.muscache.com/im/pictures/85805287/7c6b38ae_original.jpg?aki_policy=small https://a0.muscache.com/im/pictures/85805287/7c6b38ae_original.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/85805287/7c6b38ae_original.jpg?aki_policy=large https://a0.muscache.com/im/pictures/85805287/7c6b38ae_original.jpg?aki_policy=x_large 28811542 https://www.airbnb.com/users/show/28811542 Phoebe 2015-03-05 New York, New York, United States

My name is Phoebe. Both my parents are biologists, and it looks like it’s contagious. My favorite mode of travel is my own two feet (sometimes with the help of a canoe). I have many passions, but two of them are learning about the natural world, and showing people how wonderful it is! I’ve lead canoe trips in northern Ontario, trail crews in Colorado and Virginia, and I’m always looking for the next adventure.

I’ve been a part of the Air BnB community since 2015, and I love meeting new people and exchanging stories. I keep my plate very full, so I do not cook for guests, and sometimes my house gets a little messy, but I make sure to keep common areas clear and clean for your use. I have refurbished the house mostly on my own (with help from friends and family for the heavy lifting of course), and I’m welcome to suggestions for updates!

(you’ll see french on the list of languages spoken, I speak only a little french because I went to school on Montrèal)
within an hour 100% N/A FALSE https://a0.muscache.com/im/pictures/43cb9adc-f198-47cb-ba44-8a737440e825.jpg?aki_policy=profile_small https://a0.muscache.com/im/pictures/43cb9adc-f198-47cb-ba44-8a737440e825.jpg?aki_policy=profile_x_medium City Island 1 1 [‘email’, ‘phone’, ‘reviews’, ‘jumio’] TRUE TRUE City Island, Bronx, NY 10464, United States City Island City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.85002 -73.78933 TRUE House Private room 4 1 1 1 Real Bed {Internet,“Wireless Internet”,“Air conditioning”,Kitchen,“Free parking on premises”,“Pets allowed”,Gym,Heating,“Family/kid friendly”,“Smoke detector”,“Carbon monoxide detector”,“First aid kit”,Essentials,“Laptop friendly workspace”,“translation missing: en.hosting_amenity_50”} NA $69.00 $350.00 $1,200.00 NA $17.00 2 $15.00 3 1125 yesterday NA 17 47 77 352 2017-05-04 86 2015-05-25 2017-04-22 97 10 10 10 10 10 10 FALSE NA NA TRUE moderate FALSE FALSE 1 3.63
9147025 https://www.airbnb.com/rooms/9147025 2.01705e+13 2017-05-04 Cozy City Island Cottage City Island is a small community at the edge of New York City just beyond Pelham Bay Park in the Bronx and surrounded by the waters of the Long Island Sound and Eastchester Bay. Enjoy the water views and seafood restaurants or nearby Orchard Beach. This cozy little cottage is an open floor plan with a queen size murphy bed. There is brand new bathroom, eat in kitchen, large fenced in yard, perfect for BBQ’s or sitting under the full grown apple tree. The living room has lots of windows and a partial view of the sound and Manhattan. There are two bikes available for use. The location is walking distance to many of the islands best eateries, bars, Nautical Museum, and historic City Island Cemetery . Beautiful Pelham Park is close by, where you can enjoy the boardwalk at Orchard Beach or hike and bike the many trails. Fishing charters also available. Enjoyable get away any time of the year. You will forget you are even in the Bronx, and be transported to New England. City Island is a hidden gem. City Island is a small community at the edge of New York City just beyond Pelham Bay Park in the Bronx and surrounded by the waters of the Long Island Sound and Eastchester Bay. Enjoy the water views and seafood restaurants or nearby Orchard Beach. This cozy little cottage is an open floor plan with a queen size murphy bed. There is brand new bathroom, eat in kitchen, large fenced in yard, perfect for BBQ’s or sitting under the full grown apple tree. The living room has lots of windows and a partial view of the sound and Manhattan. There are two bikes available for use. The location is walking distance to many of the islands best eateries, bars, Nautical Museum, and historic City Island Cemetery . Beautiful Pelham Park is close by, where you can enjoy the boardwalk at Orchard Beach or hike and bike the many trails. Fishing charters also available. Enjoyable get away any time of the year. You will forget you are even in the Bronx, and be transported to New England. City Island is a hi none City Island is famous for it’s seafood restaurants both large and small, casual and fancy. The Crab Shanty, Sammy’s, The Lobster Box, Portofino’s and the Seashore, which also has a large bar with live music on Fridays and boasts a happy hour with drink specials. There are also several other restaurants to choose from that have a variety of offerings. A City Island local favorite is The Black Whale, which also has a sunday bunch, Arties; italain food, Don Coqui, Puerto Rician food, Bistro SK, french, and Ohana with table side hibachi. End your night at Paddy’s On The Island, which is always filled with colorful locals and sometimes live music. Please note that it is a 15 minute bus ride to the NYC subway system. Although it is not required to enjoy your stay, having a car is helpful. The small beach is at the end of the block, 3 houses down. The house is NOT on the water. You can see the water from the windows in the living/bed room. City Island is a walkable Island. Everything is very close. It is helpful to have a car if you want to venture further off the Island. The city subway system is a bus ride away. The MTA bus service takes approximately 15 minutes from City Island to the Pelham Bay 6 train subway stop. The subway ride into Manhattan from there is approximately 1 hour. There is also a MTA Express bus into Manhattan that runs on a limited schedule from City Island Ave. stops to stops in Manhattan. Please consult the MTA website before booking or email me if you have questions. This is a whole private house on a charming quiet residential street. You will have use of the large fenced in back yard. Private driveway for one car. I live close by and know the area well if you have any issues or questions. Friendly and helpful neighbors. Quiet street. No Smoking https://a0.muscache.com/im/pictures/abbac002-c8a5-44da-9405-4897e73667c6.jpg?aki_policy=small https://a0.muscache.com/im/pictures/abbac002-c8a5-44da-9405-4897e73667c6.jpg?aki_policy=medium https://a0.muscache.com/im/pictures/abbac002-c8a5-44da-9405-4897e73667c6.jpg?aki_policy=large https://a0.muscache.com/im/pictures/abbac002-c8a5-44da-9405-4897e73667c6.jpg?aki_policy=x_large 403032 https://www.airbnb.com/users/show/403032 Diane 2011-02-21 New York, New York, United States NA within an hour 100% N/A TRUE https://a0.muscache.com/im/pictures/dc275952-e2d7-463e-93a5-994fa8c31e04.jpg?aki_policy=profile_small https://a0.muscache.com/im/pictures/dc275952-e2d7-463e-93a5-994fa8c31e04.jpg?aki_policy=profile_x_medium NA 1 1 [‘email’, ‘phone’, ‘facebook’, ‘reviews’] TRUE FALSE Bronx, NY 10464, United States NA City Island Bronx Bronx NY 10464 New York Bronx, NY US United States 40.84487 -73.78954 FALSE House Entire home/apt 2 1 0 1 Real Bed {TV,“Wireless Internet”,“Air conditioning”,Kitchen,“Free parking on premises”,Heating,“Family/kid friendly”,Washer,Dryer,“Smoke detector”,“Carbon monoxide detector”,“Safety card”,Essentials,Shampoo,“24-hour check-in”,Hangers,“Hair dryer”,Iron,“Laptop friendly workspace”,“Self Check-In”,Lockbox} NA $125.00 $550.00 NA $500.00 $35.00 1 $0.00 2 28 a week ago NA 23 33 51 129 2017-05-03 41 2015-12-26 2017-04-27 97 10 10 10 10 10 10 FALSE NA NA FALSE moderate FALSE FALSE 1 2.48
  • For Zillow - Cost Data,
zhvi <- imp("Zip_Zhvi_2bedroom.csv")
## 8946 262
head.cust(zhvi)
RegionID RegionName City State Metro CountyName SizeRank 1996-04 1996-05 1996-06 1996-07 1996-08 1996-09 1996-10 1996-11 1996-12 1997-01 1997-02 1997-03 1997-04 1997-05 1997-06 1997-07 1997-08 1997-09 1997-10 1997-11 1997-12 1998-01 1998-02 1998-03 1998-04 1998-05 1998-06 1998-07 1998-08 1998-09 1998-10 1998-11 1998-12 1999-01 1999-02 1999-03 1999-04 1999-05 1999-06 1999-07 1999-08 1999-09 1999-10 1999-11 1999-12 2000-01 2000-02 2000-03 2000-04 2000-05 2000-06 2000-07 2000-08 2000-09 2000-10 2000-11 2000-12 2001-01 2001-02 2001-03 2001-04 2001-05 2001-06 2001-07 2001-08 2001-09 2001-10 2001-11 2001-12 2002-01 2002-02 2002-03 2002-04 2002-05 2002-06 2002-07 2002-08 2002-09 2002-10 2002-11 2002-12 2003-01 2003-02 2003-03 2003-04 2003-05 2003-06 2003-07 2003-08 2003-09 2003-10 2003-11 2003-12 2004-01 2004-02 2004-03 2004-04 2004-05 2004-06 2004-07 2004-08 2004-09 2004-10 2004-11 2004-12 2005-01 2005-02 2005-03 2005-04 2005-05 2005-06 2005-07 2005-08 2005-09 2005-10 2005-11 2005-12 2006-01 2006-02 2006-03 2006-04 2006-05 2006-06 2006-07 2006-08 2006-09 2006-10 2006-11 2006-12 2007-01 2007-02 2007-03 2007-04 2007-05 2007-06 2007-07 2007-08 2007-09 2007-10 2007-11 2007-12 2008-01 2008-02 2008-03 2008-04 2008-05 2008-06 2008-07 2008-08 2008-09 2008-10 2008-11 2008-12 2009-01 2009-02 2009-03 2009-04 2009-05 2009-06 2009-07 2009-08 2009-09 2009-10 2009-11 2009-12 2010-01 2010-02 2010-03 2010-04 2010-05 2010-06 2010-07 2010-08 2010-09 2010-10 2010-11 2010-12 2011-01 2011-02 2011-03 2011-04 2011-05 2011-06 2011-07 2011-08 2011-09 2011-10 2011-11 2011-12 2012-01 2012-02 2012-03 2012-04 2012-05 2012-06 2012-07 2012-08 2012-09 2012-10 2012-11 2012-12 2013-01 2013-02 2013-03 2013-04 2013-05 2013-06 2013-07 2013-08 2013-09 2013-10 2013-11 2013-12 2014-01 2014-02 2014-03 2014-04 2014-05 2014-06 2014-07 2014-08 2014-09 2014-10 2014-11 2014-12 2015-01 2015-02 2015-03 2015-04 2015-05 2015-06 2015-07 2015-08 2015-09 2015-10 2015-11 2015-12 2016-01 2016-02 2016-03 2016-04 2016-05 2016-06 2016-07 2016-08 2016-09 2016-10 2016-11 2016-12 2017-01 2017-02 2017-03 2017-04 2017-05 2017-06
61639 10025 New York NY New York New York 1 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 798600 798800 801500 804600 814900 828300 835700 849300 858100 854000 834800 821700 830300 853700 868300 875200 882200 892400 905000 924000 934400 932100 927500 923600 907900 890900 883400 896100 923900 952900 964600 972500 973800 973400 966500 966800 967100 974800 976800 976100 973700 974500 973200 966400 950400 933300 920900 909400 891400 873300 858800 850200 842800 834000 828800 821400 813900 813300 821500 831700 845100 854500 858900 859200 863500 876000 886100 890000 894200 901800 909500 913300 907400 900000 897700 896300 892300 890400 888600 891700 899500 904400 908200 914000 915100 912300 914000 921100 923300 917300 915000 922800 929100 937700 955700 974200 995500 1019500 1035100 1054900 1079900 1092600 1103500 1118800 1139300 1154600 1144100 1120300 1125500 1136000 1135100 1130000 1138200 1153700 1174800 1185400 1188400 1189700 1193700 1199900 1201400 1202600 1214200 1235200 1258000 1287700 1307200 1313900 1317100 1327400 1338800 1350400 1356600 1358500 1364000 1373300 1382600 1374400 1364100 1366300 1354800 1327500 1317300 1333700 1352100 1390000 1431000
84654 60657 Chicago IL Chicago Cook 2 167700 166400 166700 167200 166900 166900 168000 170100 171700 173000 174600 177600 180100 182300 184400 186300 187600 189400 190300 189700 189800 191900 194500 195500 196000 196900 198900 201400 204600 207900 211800 214600 216000 217500 220200 222800 226200 229600 232400 234400 236300 238300 241800 246100 249500 251300 253200 255700 259200 263100 266600 269500 272800 275500 278800 283400 288600 291300 292400 294600 297100 298200 299800 302000 304200 307900 311000 311400 311000 311700 312300 312000 311800 312600 313000 314400 317300 319700 320500 321000 321600 323800 326100 329000 332200 334700 336000 337300 337500 337100 334900 333100 332800 333400 335100 337800 339400 340700 343200 345000 345200 344600 344500 346200 349800 353400 355100 356300 358300 359500 359200 358500 359100 361400 364700 367500 369400 369800 369600 368700 368100 369000 370300 371700 374200 375700 376400 378200 379800 380100 380400 381200 382700 382700 380200 377800 376300 375600 376000 376200 375800 376300 377200 376800 373700 370100 368700 368600 366600 362200 358600 355300 352300 350900 350100 347900 345400 343400 342800 342600 341100 339900 338900 338200 335200 329800 325500 323600 323400 325000 325800 323200 320100 318600 317400 315700 315000 315300 315600 313900 309800 305700 301800 299500 299900 301100 300300 298900 298500 298500 297000 296800 298700 299600 300700 303900 306800 307500 308500 310000 310800 311200 313000 315800 319000 323400 327500 330000 331800 334500 336000 335700 335400 336300 338800 342400 344400 344000 343900 345100 346100 346900 348000 349700 351200 351700 350700 350400 352000 354300 355900 356500 355200 353800 353700 354600 356200 357800 358200 358500 360300 362400 363700 365200 367100 368600 370200 372300 375300 378700 381400 381800 382100 383300 385100
61637 10023 New York NY New York New York 3 NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA 1526800 1424500 1346600 1331300 1322500 1289300 1265400 1249700 1241100 1232700 1225500 1228200 1252600 1266100 1288700 1308100 1333000 1356400 1362000 1353600 1364000 1373900 1389600 1401600 1404100 1415800 1432400 1455400 1474200 1462300 1438300 1435500 1427800 1411200 1407400 1419700 1457400 1500800 1524900 1537800 1558700 1586100 1602300 1621100 1639300 1657400 1657400 1656100 1649400 1643400 1632400 1618200 1588300 1543600 1500800 1464200 1426100 1387300 1362600 1351700 1344300 1331800 1334800 1314200 1271900 1252300 1262300 1279200 1309000 1335300 1353800 1366400 1372100 1381300 1385000 1388100 1399100 1399800 1389300 1384700 1380900 1367900 1365400 1375100 1380400 1377000 1375100 1379000 1395200 1414500 1419000 1403100 1383200 1376700 1378200 1378700 1375900 1366700 1365500 1382200 1404700 1428000 1445700 1452900 1460100 1484400 1508400 1522800 1538300 1568600 1597400 1622900 1654300 1684600 1713000 1728800 1736100 1745900 1753800 1736600 1730400 1734500 1728700 1720800 1717700 1700100 1680400 1676400 1685600 1708100 1730400 1751800 1778300 1810400 1831600 1844400 1861600 1889600 1901500 1895300 1890200 1898400 1924500 1967300 1993500 1980700 1960900 1951300 1937800 1929800 1955000 2022400 2095000 2142300
84616 60614 Chicago IL Chicago Cook 4 195800 193500 192600 192300 192600 193600 195500 197600 199400 201300 203600 206500 209200 211100 212600 214400 215600 216500 217900 220100 222200 223900 225400 227700 230100 231700 232700 233700 234700 235600 236800 238800 240800 242400 243800 246400 250200 254300 257600 261100 264800 267900 270700 272800 274400 276200 278600 280100 283100 287700 293600 298500 302700 305000 306800 309400 313100 314900 316200 318200 320600 322900 325500 328400 330700 332800 334400 335900 337400 339700 342300 343800 343400 342300 341800 341700 342400 344300 346900 348900 350200 351700 353500 355700 358000 361600 364000 365500 366400 367000 365200 363100 362000 362500 364800 368200 370800 371400 371200 371800 374200 377800 380100 381700 384400 386600 385700 385400 388500 392200 394500 394900 394800 395500 400000 404300 407600 409500 410400 408700 406400 403800 402400 402300 403000 403100 403600 404500 406100 407700 408700 409600 409800 408700 408600 411000 412200 411400 410200 408400 406600 406500 406400 404600 402900 400600 397500 392600 387500 383700 381000 378900 377700 377400 376700 374000 369600 366200 364500 364100 364300 363600 361900 361900 356400 347100 342400 344000 345200 346900 346100 342600 340100 339900 338600 335500 333900 335000 336000 334200 330300 327000 326000 326100 326700 326300 324400 322700 323200 322800 320700 319500 320100 320500 321800 323600 324300 324100 324700 326000 327600 329800 332600 336800 342300 348100 353600 358900 361900 363900 366200 368300 369800 371400 372400 373200 373800 374800 376200 376800 376300 374900 373800 373900 374700 375300 375000 374700 376300 378100 378000 377700 378300 380000 383100 385900 388100 389700 391800 393400 394700 394900 395700 396400 397500 398900 401200 403200 405700 408300 408800 408000 410100 412200 412200
93144 79936 El Paso TX El Paso El Paso 5 59100 60500 60900 60800 60300 60400 61200 61700 61000 60100 59300 59000 58700 58400 58000 57800 57900 57800 57800 58100 58400 58700 59200 59400 58700 58100 57900 57900 57700 57600 57500 57800 58000 58000 57900 57800 57800 58000 58500 58700 59000 59200 59300 59500 59900 60300 60400 60300 60300 60000 59500 59400 59800 59900 59700 59500 59400 59500 59700 59700 59200 58700 58400 57800 57000 56700 56800 56600 56500 56500 56600 56700 56600 56700 57000 57300 57300 57300 57100 56900 56900 57000 56700 56700 57000 57400 57700 58000 58300 58600 58800 59100 59500 60200 61100 61700 62400 63100 63600 63800 64400 64900 65100 65200 65300 65600 66300 67100 67900 68800 69600 70600 71500 72100 72600 73400 74200 74500 75000 75700 76400 77100 77700 78100 78700 79600 80600 81500 82300 83200 83900 84400 84300 84400 85200 86100 86800 87400 87700 87800 88300 88800 88500 87900 87500 86600 85500 84600 84200 83900 83600 83400 83500 84100 84600 84600 84700 84600 84600 85100 85500 85800 86600 87600 86500 84200 83300 83800 83800 83600 83600 83400 82900 82400 82200 82100 82300 82800 82900 82600 82700 83100 83300 83100 82800 82500 82300 82200 82300 82200 81900 81700 82100 82600 82900 83000 83000 82900 82100 81200 80800 80700 81200 81800 81800 81400 81400 81500 81900 82000 81900 81900 82100 82100 81500 80800 80300 80100 80100 80700 81200 81700 81900 81700 81500 81700 81700 80900 81000 81500 81400 80500 80000 80100 80500 80800 81400 82300 82600 82600 82500 82500 82600 82700 82600 82400 82300 82400 82300 82500 83200 83900 84100 83900 83700
84640 60640 Chicago IL Chicago Cook 6 123300 122600 122000 121500 120900 120600 120900 121300 121600 122100 122900 124200 125300 126100 126700 127900 129300 130400 131300 131700 132300 133500 134500 134800 135200 135500 136300 137700 139600 141600 143400 144500 145600 147400 149200 149900 150500 151700 152800 153900 156400 159400 161800 163800 165700 167100 168400 169600 171000 172400 174800 177900 180400 182300 184600 187700 190700 193100 196100 200000 202900 205500 207600 208600 209200 210000 210200 211300 213000 214100 215200 217600 220200 222400 224600 227000 228600 230100 232400 234300 235300 236200 237000 237900 239200 241300 243600 244200 243800 244500 245500 245700 246400 247500 248000 249000 251100 252900 253700 256100 259400 261300 262200 263800 265300 267600 270500 272800 273700 273600 273200 273300 273500 273800 274300 274400 274400 275300 276600 278200 280600 283100 284400 284200 283000 282000 282400 283700 285200 286700 286700 284900 283300 282700 282100 280800 280300 280200 280100 279900 281200 282300 282100 279300 275800 273100 272600 271800 270300 268600 266900 264800 262900 260700 258500 256600 255000 252900 250600 248900 248900 250100 250300 250400 248000 243800 240200 239700 239000 238200 236800 235400 233300 231300 230300 228200 225200 223100 222300 220700 218300 215900 213200 210900 209600 208200 205900 204000 203200 202500 200800 199400 199900 201900 204500 207000 208100 207100 205300 204700 204700 205800 208600 211800 213500 213800 215100 218400 221500 223900 226100 227900 229100 230200 230600 230400 230000 229000 227600 226100 225700 226200 226500 226500 227100 227800 229400 231800 234100 235400 235100 233900 233700 235300 237200 238500 239300 239600 239500 240200 242700 244900 247700 249500 248800 247000 247300 248700 250800 252800 253800 253800 253400 254100 255100

4.2 Joining & Filtering Criteria

Based on preliminary data exploration and task requirements, we have

4.2.1 Identification

  • Joining
    • Airbnb - zipcode
    • Zillow - RegionName
  • Filtering
    • Airbnb
      1. bedrooms = 2, given requirement
      2. minimum_nights < 7, to ensure short term stay
    • Zillow
      1. City = ‘New York’

4.2.2 Data Quality Check

  • First we check if we have duplicates in the data and what is the grain of information. Below code chunk tells us that there are no duplicates and that Listing data is at the level of Property Listings whereas Zhvi data is at the level of zipcodes
print("Duplicate rows in Listing Dataset")
## [1] "Duplicate rows in Listing Dataset"
nrow(listing)-nrow(unique(listing))
## [1] 0
print("Duplicate rows in Zhvi Dataset")
## [1] "Duplicate rows in Zhvi Dataset"
nrow(zhvi)-nrow(unique(zhvi))
## [1] 0
print("Duplicates in the primary key 'id' for Listing")
## [1] "Duplicates in the primary key 'id' for Listing"
nrow(listing) - length(unique(listing$id))
## [1] 0
print("Duplicates in the primary key 'RegionName' for Zhvi")
## [1] "Duplicates in the primary key 'RegionName' for Zhvi"
nrow(zhvi)-length(unique(zhvi$RegionName))
## [1] 0
  • Airbnb: Zipcode - Below code chunk tells us the number of missing values
listing$zipcode %>% 
  is.na() %>% 
  sum() %>% cat()
## 611

Since we know that Zipcode is a 5 digits code, we check for irregularities. We notice that there are values in this field which do not adhere to pattern

listing$zipcode %>% 
  nchar() %>% 
  table %>%
  kable(col.names = c("size","Frequency")) %>% 
  kable_styling(bootstrap_options = c("responsive","striped")) 
size Frequency
2 1
5 40135
10 5
11 1
  • Airbnb: bedrooms - Below code chunk tells us the number of missing values. There are some missing values
listing$bedrooms %>% 
  is.na() %>% 
  sum() %>% cat()
## 69

Since we know that Bedroom is numeric, we check for irregularities. We found none

listing$bedrooms %>% 
  sapply(function(x) is.numeric(x)) %>% sum() - nrow(listing) 
## [1] 0
  • Airbnb: Minimum Nights - Below code chunk tells us the number of missing values. There are no missing values
listing$minimum_nights %>% 
  is.na() %>% 
  sum() %>% cat()
## 0

We also check if values are non-numeric

listing$minimum_nights %>% 
  sapply(function(x) is.numeric(x)) %>% sum() - nrow(listing) 
## [1] 0
  • Zillow - RegionName Below code chunk tells us the number of missing values. We notice no missing values
zhvi$RegionName %>% 
  is.na() %>% 
  sum() %>% cat()
## 0

Since we know that Zipcode is a 5 digit code, we check for irregularities. We notice that there are no bad values

zhvi$RegionName %>% 
  nchar() %>% 
  table %>%
  kable(col.names = c("size","Frequency")) %>% 
  kable_styling(bootstrap_options = c("responsive","striped")) 
size Frequency
5 8946
  • Zillow - City Below code chunk tells us the number of missing values. We notice no missing values
zhvi$City %>% 
  is.na() %>% 
  sum() %>% cat()
## 0

We also check if values have different cases or not. Since we know the total number of values, a different number would indicate an issue. We found no issues with pattern but found multiple inconsistencies with the values on further exploration. For example New York is sometimes listed as ‘NY’, ‘new york new york’, etc.

nrow(zhvi) - sum(tolower(zhvi$City)==zhvi$City)
## [1] 8946
  • Testing the join Since we know that for some aspect of our analysis, we will have to join the data sets, we test if there are zipcodes which are present in Zillow & not in Airbnb. Doing it the other way round will be very difficult because there are many names by which the New York City is represented. Indeed, we observe that not all Zillow zipcodes are present in Airbnb data
listing.zip <- listing %>% 
  filter(bedrooms==2 & minimum_nights < 7 ) %>%
  select(zipcode) %>%  unique() %>%
  sapply(function(x) substring(x,1,5))  %>% as.data.frame

zhvi.zip <- zhvi %>%
  filter(trimws(tolower(City)) == 'new york')  %>%
  select(RegionName) %>%  unique() %>% as.data.frame

print(paste("Number of distinct zipcodes  based on bedrooms and minimum nights in listing",nrow(listing.zip)-sum(is.na(listing.zip)) ))
## [1] "Number of distinct zipcodes  based on bedrooms and minimum nights in listing 151"
print(paste("Number of distinct zipcodes in zhvi",nrow(zhvi.zip) ))
## [1] "Number of distinct zipcodes in zhvi 25"
names(listing.zip) <- "x" # assigning same name to both vectors for intersect()
names(zhvi.zip) <- "x" 

print(paste("matching zipcodes in both",
            suppressWarnings(intersect(listing.zip,zhvi.zip)) %>% nrow() )) 
## [1] "matching zipcodes in both 22"

4.2.3 Data Pre-processing

Key takeaways from previous step which will be the basis for this step are -

  • subset first 5 characters for zipcodes in Airbnb and remove missing or other irregular values
  • Filter both datasets based on pre-defined requirement to reduce volume
listing$zipcode <- substring(listing$zipcode,1,5) # regularize zip in Listing
zhvi$City <- trimws(tolower(zhvi$City))  # regularize city  in zhvi

air_sub <- listing %>% select(names(.)) %>% 
  filter( bedrooms == 2 & # this takes care of missing values as well
          zipcode %in% zhvi.zip$x  # this takes care of missing values as well
        )

zil_sub <- zhvi %>% select(names(.)) %>% 
  filter( City == "new york") 

print(paste("Dimensions of cleaned Listing subset:",
             "Rows -",dim(air_sub)[1],"Columns -",dim(air_sub)[2]))  
## [1] "Dimensions of cleaned Listing subset: Rows - 1239 Columns - 95"
print(paste("Dimensions of cleaned zhvi subset:",
             "Rows -",dim(zil_sub)[1],"Columns -",dim(zil_sub)[2]))  
## [1] "Dimensions of cleaned zhvi subset: Rows - 25 Columns - 262"

4.3 Identifying Relevant Factors & Data Quality Check

In the previous steps, we have arrived at the subset of datasets for listing & zhvi. In this section, we will be talking about the Important factors(variables) to retain from each dataset

  • Airbnb - Listing - Revenue Data:
    • neighbourhood_group_cleansed - Name of the area where the property is located
    • zipcode - Zip code where the property is located
    • bedrooms - Indicates the number of bedrooms within the property
    • square_feet - Square footage of the property or space for rent
    • price - Price the host is charging to stay per night
    • minimum_nights - Minimum nights the host is willing to rent out the property
    • availability_30 - Number of days the property is available for rent within 30 days
    • review_scores_rating - Overall score given based on accuracy, cleanliness, check-in, communication, location, and value
  • Missing Values in Listing Subset - Here we notice that only ‘review_scores_rating’ and ‘square_feet’ have missing values. However, ‘square_feet’ has too many missing values and will be removed from final subset in subsequent steps
imp.var.air <- c("neighbourhood_group_cleansed","square_feet","price",
                 "minimum_nights","availability_30","review_scores_rating")

air_sub %>% select(imp.var.air) %>% 
  sapply(function(x) sum(is.na(x))) %>% kable() %>%
  kable_styling(bootstrap_options = c("responsive","striped")) %>%
  scroll_box(width = "100%", height = "200px")
x
neighbourhood_group_cleansed 0
square_feet 1210
price 0
minimum_nights 0
availability_30 0
review_scores_rating 268
  • Bad Values in Listing Subset - Here we notice that we will have to transform Price into numeric variable by removing ‘$’ & ‘,’ . Rest all variables look good enough
for(i in 1:length(imp.var.air)) {
  cat('\n')
  cat(imp.var.air[i])
  cat('\n')
  air_sub[,imp.var.air[i]] %>%
  table() %>%
  as.data.frame() %>% 
  arrange(desc(Freq)) %>%
  kable() %>%  
  kable_styling(bootstrap_options = c("responsive","striped")) %>%
  scroll_box(width = "100%", height = "200px") %>% print()
  cat('\n')
}
neighbourhood_group_cleansed
. Freq
Manhattan 841
Brooklyn 374
Staten Island 15
Queens 9
square_feet
. Freq
1000 5
1100 4
900 3
0 2
800 2
850 2
1200 2
1500 2
208 1
600 1
700 1
797 1
950 1
1150 1
1600 1
price
. Freq
$250.00 93
$200.00 72
$300.00 59
$150.00 56
$350.00 42
$400.00 32
$175.00 31
$275.00 26
$450.00 24
$325.00 22
$299.00 21
$220.00 20
$160.00 19
$225.00 19
$120.00 18
$180.00 18
$100.00 17
$199.00 17
$140.00 16
$500.00 16
$125.00 14
$195.00 13
$170.00 12
$190.00 12
$249.00 12
$165.00 11
$290.00 11
$375.00 11
$149.00 10
$240.00 10
$600.00 10
$115.00 9
$185.00 9
$425.00 9
$499.00 9
$130.00 8
$135.00 8
$245.00 8
$550.00 8
$70.00 8
$110.00 7
$129.00 7
$145.00 7
$169.00 7
$189.00 7
$230.00 7
$1,000.00 6
$155.00 6
$239.00 6
$265.00 6
$395.00 6
$99.00 6
$210.00 5
$215.00 5
$219.00 5
$295.00 5
$75.00 5
$85.00 5
$90.00 5
$148.00 4
$205.00 4
$279.00 4
$280.00 4
$285.00 4
$349.00 4
$399.00 4
$449.00 4
$495.00 4
$590.00 4
$65.00 4
$650.00 4
$1,500.00 3
$139.00 3
$198.00 3
$218.00 3
$229.00 3
$235.00 3
$255.00 3
$260.00 3
$270.00 3
$271.00 3
$305.00 3
$315.00 3
$320.00 3
$345.00 3
$360.00 3
$365.00 3
$385.00 3
$410.00 3
$480.00 3
$490.00 3
$525.00 3
$599.00 3
$700.00 3
$79.00 3
$80.00 3
$89.00 3
$95.00 3
$1,200.00 2
$105.00 2
$119.00 2
$142.00 2
$154.00 2
$159.00 2
$161.00 2
$162.00 2
$224.00 2
$226.00 2
$227.00 2
$251.00 2
$258.00 2
$269.00 2
$289.00 2
$297.00 2
$321.00 2
$329.00 2
$355.00 2
$389.00 2
$390.00 2
$555.00 2
$60.00 2
$800.00 2
$88.00 2
$1,600.00 1
$1,650.00 1
$104.00 1
$107.00 1
$111.00 1
$113.00 1
$116.00 1
$123.00 1
$124.00 1
$131.00 1
$133.00 1
$134.00 1
$136.00 1
$137.00 1
$138.00 1
$146.00 1
$151.00 1
$157.00 1
$158.00 1
$164.00 1
$168.00 1
$179.00 1
$181.00 1
$183.00 1
$184.00 1
$187.00 1
$188.00 1
$193.00 1
$197.00 1
$2,000.00 1
$204.00 1
$206.00 1
$207.00 1
$208.00 1
$209.00 1
$213.00 1
$222.00 1
$259.00 1
$267.00 1
$272.00 1
$273.00 1
$276.00 1
$28.00 1
$293.00 1
$298.00 1
$3,750.00 1
$301.00 1
$309.00 1
$318.00 1
$319.00 1
$322.00 1
$324.00 1
$330.00 1
$335.00 1
$338.00 1
$339.00 1
$340.00 1
$351.00 1
$356.00 1
$357.00 1
$359.00 1
$369.00 1
$372.00 1
$374.00 1
$379.00 1
$380.00 1
$398.00 1
$4,700.00 1
$40.00 1
$420.00 1
$430.00 1
$433.00 1
$440.00 1
$445.00 1
$455.00 1
$459.00 1
$465.00 1
$475.00 1
$485.00 1
$492.00 1
$496.00 1
$50.00 1
$515.00 1
$530.00 1
$545.00 1
$55.00 1
$575.00 1
$58.00 1
$595.00 1
$67.00 1
$68.00 1
$685.00 1
$691.00 1
$699.00 1
$71.00 1
$750.00 1
$760.00 1
$775.00 1
$789.00 1
$82.00 1
$877.00 1
$878.00 1
$900.00 1
$950.00 1
$975.00 1
$98.00 1
$989.00 1
minimum_nights
. Freq
2 325
1 309
3 252
4 101
5 81
7 65
30 32
6 24
10 9
14 9
21 6
8 5
15 5
31 4
29 2
12 1
13 1
20 1
25 1
27 1
28 1
60 1
88 1
90 1
120 1
availability_30
. Freq
0 575
1 57
4 54
2 46
5 45
3 43
9 36
30 35
7 34
6 31
11 30
8 28
10 26
12 24
13 17
16 17
17 17
23 17
29 17
14 15
15 13
21 10
22 10
28 10
18 8
20 7
19 6
24 6
26 3
25 1
27 1
review_scores_rating
. Freq
100 277
90 67
95 65
93 62
97 57
96 53
80 48
98 45
94 44
91 32
92 31
88 27
89 25
87 24
99 22
85 14
83 13
86 12
60 10
84 9
81 7
78 6
82 5
40 3
70 2
77 2
79 2
20 1
63 1
64 1
72 1
74 1
75 1
76 1
  • Zillow - zhvi - Cost Data:
    • RegionName Zip code of where the property is located
    • City - City of where the property is located
    • SizeRank - Population of the area; the lower the number the greater the population
    • Median Price columns - To decide which ones to pick, we will check how many missing values they have and pick a set of columns which is continous and has minimum missing values. We observe that there are no missing values from 2007-06 to 2017-06 and these are the columns which we will be using for further analysis
imp.var.zil <- c("RegionID","RegionName","City","State","Metro","CountyName",
                 "SizeRank")
zil_sub %>% select(-imp.var.zil) %>% 
  sapply(function(x) sum(is.na(x))) %>% kable() %>%
  kable_styling(bootstrap_options = c("responsive","striped")) %>%
  scroll_box(width = "100%", height = "200px")
x
1996-04 17
1996-05 17
1996-06 17
1996-07 17
1996-08 17
1996-09 17
1996-10 17
1996-11 17
1996-12 17
1997-01 17
1997-02 17
1997-03 17
1997-04 17
1997-05 17
1997-06 17
1997-07 17
1997-08 17
1997-09 17
1997-10 17
1997-11 17
1997-12 17
1998-01 17
1998-02 17
1998-03 17
1998-04 17
1998-05 17
1998-06 17
1998-07 17
1998-08 17
1998-09 17
1998-10 16
1998-11 16
1998-12 16
1999-01 16
1999-02 16
1999-03 16
1999-04 16
1999-05 16
1999-06 16
1999-07 16
1999-08 16
1999-09 16
1999-10 16
1999-11 16
1999-12 16
2000-01 16
2000-02 16
2000-03 16
2000-04 16
2000-05 16
2000-06 16
2000-07 16
2000-08 16
2000-09 16
2000-10 16
2000-11 16
2000-12 16
2001-01 16
2001-02 16
2001-03 16
2001-04 16
2001-05 16
2001-06 16
2001-07 16
2001-08 16
2001-09 16
2001-10 16
2001-11 16
2001-12 16
2002-01 16
2002-02 17
2002-03 17
2002-04 17
2002-05 17
2002-06 17
2002-07 17
2002-08 17
2002-09 17
2002-10 17
2002-11 17
2002-12 17
2003-01 17
2003-02 17
2003-03 17
2003-04 17
2003-05 17
2003-06 17
2003-07 17
2003-08 17
2003-09 17
2003-10 17
2003-11 16
2003-12 14
2004-01 13
2004-02 13
2004-03 12
2004-04 12
2004-05 12
2004-06 10
2004-07 10
2004-08 10
2004-09 5
2004-10 4
2004-11 4
2004-12 4
2005-01 4
2005-02 4
2005-03 4
2005-04 3
2005-05 3
2005-06 3
2005-07 3
2005-08 3
2005-09 3
2005-10 1
2005-11 1
2005-12 1
2006-01 1
2006-02 1
2006-03 1
2006-04 1
2006-05 1
2006-06 1
2006-07 1
2006-08 1
2006-09 1
2006-10 1
2006-11 1
2006-12 1
2007-01 1
2007-02 1
2007-03 1
2007-04 1
2007-05 1
2007-06 0
2007-07 0
2007-08 0
2007-09 0
2007-10 0
2007-11 0
2007-12 0
2008-01 0
2008-02 0
2008-03 0
2008-04 0
2008-05 0
2008-06 0
2008-07 0
2008-08 0
2008-09 0
2008-10 0
2008-11 0
2008-12 0
2009-01 0
2009-02 0
2009-03 0
2009-04 0
2009-05 0
2009-06 0
2009-07 0
2009-08 0
2009-09 0
2009-10 0
2009-11 0
2009-12 0
2010-01 0
2010-02 0
2010-03 0
2010-04 0
2010-05 0
2010-06 0
2010-07 0
2010-08 0
2010-09 0
2010-10 0
2010-11 0
2010-12 0
2011-01 0
2011-02 0
2011-03 0
2011-04 0
2011-05 0
2011-06 0
2011-07 0
2011-08 0
2011-09 0
2011-10 0
2011-11 0
2011-12 0
2012-01 0
2012-02 0
2012-03 0
2012-04 0
2012-05 0
2012-06 0
2012-07 0
2012-08 0
2012-09 0
2012-10 0
2012-11 0
2012-12 0
2013-01 0
2013-02 0
2013-03 0
2013-04 0
2013-05 0
2013-06 0
2013-07 0
2013-08 0
2013-09 0
2013-10 0
2013-11 0
2013-12 0
2014-01 0
2014-02 0
2014-03 0
2014-04 0
2014-05 0
2014-06 0
2014-07 0
2014-08 0
2014-09 0
2014-10 0
2014-11 0
2014-12 0
2015-01 0
2015-02 0
2015-03 0
2015-04 0
2015-05 0
2015-06 0
2015-07 0
2015-08 0
2015-09 0
2015-10 0
2015-11 0
2015-12 0
2016-01 0
2016-02 0
2016-03 0
2016-04 0
2016-05 0
2016-06 0
2016-07 0
2016-08 0
2016-09 0
2016-10 0
2016-11 0
2016-12 0
2017-01 0
2017-02 0
2017-03 0
2017-04 0
2017-05 0
2017-06 0
  • Missing Values in Zillow Subset - There seems to be no problem with missing values in this case.
imp.var.zil <- c("RegionName","City","SizeRank")
zil_sub %>% select(imp.var.zil) %>% 
  sapply(function(x) sum(is.na(x))) %>% kable() %>%
  kable_styling(bootstrap_options = c("responsive","striped")) %>%
  scroll_box(width = "100%", height = "200px")
x
RegionName 0
City 0
SizeRank 0
  • Bad Values in Zillow Subset - There seems to be no problem of bad values in the subset. We have also checked that there are no negative prices in the data

4.4 Finalized Datasets

We have 3 datasets that will be used in the subsequent steps.

  • Air Subset - A subset of Listing Revenue data. Below code chunk introduces a function ‘subset_air’ which requires a dataframe, number of bedrooms, list of relevant zipcodes & min number of nights per stay to create a subset of the Listing dataset which can be directly used in future analyses when we have new/enriched data
subset_air <- function(df,bedroom,ziplist,min_night) {
df$zipcode <- substring(df$zipcode,1,5) # regularize zip in listing
df$price <- df$price %>% # regularize price  in listing
            sapply(function(x) gsub('\\,', '',x)) %>% #remove ,
            sapply(function(x) gsub('\\$','',x) ) %>%  #remove $
            as.numeric() #convert to number
df$review_scores_rating %<>%  replace(.,is.na(.)==1,0) #NA handling rvw scr rt
df$minimum_nights %<>%  replace(.,is.na(.)==1,0) #NA handling min night
df$availability_30 %<>%  replace(.,is.na(.)==1,0) #NA handling availability_30

imp.var.air <- c("neighbourhood_group_cleansed","bedrooms","price",
                "minimum_nights","availability_30","review_scores_rating",
                 "zipcode","city")
x <- df %>% select(imp.var.air)  %>% 
  filter( bedrooms == bedroom & # this takes care of missing values as well
          zipcode %in% ziplist & #remove unwanted zipcodes
          minimum_nights < min_night & # ensure short term stay
          replace(price,is.na(price),0) > 0  
        ) 
return(x)
    
  }
air_sub <- subset_air(listing,2,zhvi.zip$x,7)
print(paste("Dimensions of cleaned subset of Listing :",
             "Rows -",dim(air_sub)[1],"Columns -",dim(air_sub)[2])) 
## [1] "Dimensions of cleaned subset of Listing : Rows - 1092 Columns - 8"
head.cust(air_sub)
neighbourhood_group_cleansed bedrooms price minimum_nights availability_30 review_scores_rating zipcode city
Staten Island 2 110 2 1 0 10305 Staten Island
Staten Island 2 150 2 0 0 10305 Staten Island
Brooklyn 2 125 1 0 94 11201 Brooklyn
Brooklyn 2 250 3 0 96 11217 Brooklyn
Brooklyn 2 180 2 0 100 11201 Brooklyn
Brooklyn 2 250 3 0 0 11217 Brooklyn
  • Zil Subset - A subset of Zillow Pricing data. Similarly, to create a subset of Zillow data, we introduce the function ‘subset_zil’. Here we unpivot the data i.e. for analysis purposes, we convert YYYY_MM columns in rows using melt() function
subset_zil <- function(df,cit) {

df$RegionName <- substring(df$RegionName,1,5) # regularize zip in zhvi
df$City <- trimws(tolower(df$City))  # regularize city  in zhvi
df$SizeRank %<>%  replace(.,is.na(.)==1,0) #NA handling rvw scr rt
imp.var.zil <- c("RegionName","City","SizeRank")
yyyymm <- c("2007-06","2017-06") # define period for fetching median pricing

f.s <- which(names(df)==yyyymm[1])
f.e <- which(names(df)==yyyymm[2])

x <- subset(df,select = c(2,3,7,f.s:f.e))


x <- x %>% select(names(.))  %>% 
  filter( City == cit ) %>% as.data.frame()
# this takes care of missing values as well 

x <- melt(x,id=c("RegionName","City","SizeRank")) # unpivot columns for analysis
names(x) <- c("Zipcode","City","SizeRank","YYYY_MM","Med_Price")
return(x)
}
zil_sub <- subset_zil(zhvi,'new york')
print(paste("Dimensions of cleaned subset of zhvi:",
             "Rows -",dim(zil_sub)[1],"Columns -",dim(zil_sub)[2])) 
## [1] "Dimensions of cleaned subset of zhvi: Rows - 3025 Columns - 5"
head.cust(zil_sub)
Zipcode City SizeRank YYYY_MM Med_Price
10025 new york 1 2007-06 883400
10023 new york 3 2007-06 1411200
10128 new york 14 2007-06 1020500
10011 new york 15 2007-06 1514600
10003 new york 21 2007-06 1367700
11201 new york 32 2007-06 600400
  • Combined dataset - The joined data set is created through the function subset_com which takes Listing Subset, Joining variable in Listing, Zillow Subset & Joining variable in Zillow, respectively. The output is a combined dataset with all the columns in both subsets. Additionally, we have introduced two keys in this dataset because the grain of data is different in both subsets. Listing Subset is at Property level whereas Zillow Subset is Zipcode, YYYYMM level. If we do not include the keys, it may become difficult to subset the combined dataset for different analyses. This will happen because of the many-to-many join since the joining criterion is Zipcode and both the subsets are at a lower grain/level than that
subset_com <- function(x1,x2,y1,y2) {
 x1$id <- seq(1,nrow(x1),1)
  y1$id <- seq(1,nrow(y1),1)
  names(y1) <- gsub(y2, x2, names(y1))
 # y1 <- rename(y1, c(paste0(y2) = paste0(x2)))
 # z <- merge(x1, y1 ,by.x  = x2, by.y = y2, all.x ) %>% as.data.frame()
  z <- inner_join(x1, y1, by = x2, copy = T, keep = T,
             suffix = c(".air", ".zil"))
  return(z)
}
joined <- subset_com(air_sub,"zipcode",zil_sub,"Zipcode")

print(paste("Dimensions of cleaned subset of zhvi:",
             "Rows -",dim(joined)[1],"Columns -",dim(joined)[2])) 
## [1] "Dimensions of cleaned subset of zhvi: Rows - 132132 Columns - 14"
head.cust(joined)
neighbourhood_group_cleansed bedrooms price minimum_nights availability_30 review_scores_rating zipcode city id.air City SizeRank YYYY_MM Med_Price id.zil
Staten Island 2 110 2 1 0 10305 Staten Island 1 new york 2087 2007-06 347900 22
Staten Island 2 110 2 1 0 10305 Staten Island 1 new york 2087 2007-07 346200 47
Staten Island 2 110 2 1 0 10305 Staten Island 1 new york 2087 2007-08 345600 72
Staten Island 2 110 2 1 0 10305 Staten Island 1 new york 2087 2007-09 345400 97
Staten Island 2 110 2 1 0 10305 Staten Island 1 new york 2087 2007-10 346500 122
Staten Island 2 110 2 1 0 10305 Staten Island 1 new york 2087 2007-11 345700 147

5 Exploratory Data Analysis

I sliced and diced the data to observe trends based on different attributes

plot0 <-  joined %>%
          mutate(Year = substring(YYYY_MM,1,4)) %>% 
          select(Year,neighbourhood_group_cleansed,Med_Price) %>% unique() %>% 
          group_by(Year,neighbourhood_group_cleansed) %>%
          summarize(price = mean(Med_Price)) %>% 
                        ggplot(aes(x = Year,
                                 y = price,
                                 color = neighbourhood_group_cleansed
                                ,group = neighbourhood_group_cleansed
)) + geom_point(size = 1.5, alpha = 0.9) +
  geom_line(size = 0.5, alpha = 0.6) +
  scale_y_continuous(name ="Average Median Price", labels = scales::comma) +
  scale_x_discrete(name ="Years") +
  scale_fill_discrete(name = "Neighbourhood")

ggplotly(plot0 , width = 1200,height = 600)
#generating custom labels for the graph below
m <- unique(zil_sub$YYYY_MM) %>% as.character()
n <- rep(' ',length(m))
i <- 1 ; for (j in 1:11) {n[i] <- m[i]; i <- i + 12;}  

#actual code for the graph
plot1 <- zil_sub %>%  ggplot(aes(x = YYYY_MM,
                                 y = Med_Price,
                                 color = Zipcode,
                                 group = Zipcode
)) + geom_point(size = 0.5, alpha = 0.6) + geom_line(size = 0.5, alpha = 0.6) +
  scale_y_continuous(name ="Median Price", labels = scales::comma) +
  scale_x_discrete(name ="Years - Month", 
              labels = n) 

ggplotly(plot1,width = 1200,height = 600)
plot2 <-  zil_sub %>%
          mutate(Year = substring(YYYY_MM,1,4)) %>% 
          select(Year,Zipcode,Med_Price) %>% 
          group_by(Year,Zipcode) %>%
          summarize(price = mean(Med_Price)) %>% 
                        ggplot(aes(x = Year,
                                 y = price,
                                 color = Zipcode
                                ,group = Zipcode
)) + geom_point(size = 1.5, alpha = 0.9) +
  geom_line(size = 0.5, alpha = 0.6) +
  scale_y_continuous(name ="Average Median Price", labels = scales::comma) +
  scale_x_discrete(name ="Years")

ggplotly(plot2 , width = 1200,height = 600)
plot3 <-  zil_sub %>%
          select(YYYY_MM,Zipcode,Med_Price,SizeRank) %>% 
                                 ggplot(aes(x = SizeRank,
                                 y = Med_Price,
                                 color = Zipcode
                               #  ,group = Zipcode
)) + geom_point(size = 2, alpha = 1,aes(frame = YYYY_MM)) +
 # geom_line(size = 0.5, alpha = 0.6) +
  scale_y_continuous(name = "Median Price", labels = scales::comma) +
  scale_x_continuous(name = "Size Rank") 
 
ggplotly(plot3, width = 1200,height = 600)
plot4 <-  air_sub %>%
          select(names(.)) %>% 
                             ggplot(aes(x = price,
                                 fill = neighbourhood_group_cleansed
)) + geom_density( alpha = 0.4) +
 # geom_line(size = 0.5, alpha = 0.6) +
  scale_y_continuous(name = "Density") +
  scale_x_continuous(name = "Price per Night",
                     limits = quantile(air_sub$price, c(0, 0.99))) +
  scale_fill_discrete(name = "Neighbourhood")

ggplotly(plot4, width = 1200,height = 600)
plot5 <-  air_sub %>%
          select(names(.)) %>% 
                             ggplot(aes(x = zipcode,
                                        y = price,
                                        fill = neighbourhood_group_cleansed
)) + geom_boxplot() +

 scale_y_continuous(name = "price Per Night",
                    limits = quantile(air_sub$price, c(0, 0.99))) +
 scale_fill_discrete(name = "Neighbourhood") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplotly(plot5,width = 1200, height = 600) 
plot6 <-  air_sub %>%
          select(names(.)) %>%  
                             ggplot(aes(x = review_scores_rating,
                                        y = price,
                                 fill = neighbourhood_group_cleansed
                                 
)) + geom_point( alpha = 0.9, aes(frame = neighbourhood_group_cleansed)) +
 # geom_line(size = 0.5, alpha = 0.6) +
 scale_y_continuous(name = "Price per Night",
                    limits = quantile(air_sub$price, c(0, 0.99))) +
  scale_x_continuous(name = "Reivew Score") +
  scale_fill_discrete(name = "Neighbourhood") +
   theme(legend.position = "none") 

ggplotly(plot6,width = 1200, height = 600)
plot7 <-  air_sub %>%
          select(names(.)) %>% filter(review_scores_rating != 0) %>% 
                             ggplot(aes(x = zipcode,
                                        y = review_scores_rating,
                                        fill = neighbourhood_group_cleansed
)) + geom_boxplot() +

 scale_y_continuous(name = "Review Scores",
               limits = quantile(air_sub$review_scores_rating, c(0, 0.99))) +
 scale_fill_discrete(name = "Neighbourhood") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplotly(plot7,width = 1200, height = 600) 
plot8 <-  air_sub %>%
          select(names(.)) %>% 
                             ggplot(aes(x = zipcode,
                                        fill = neighbourhood_group_cleansed,
                                        group = neighbourhood_group_cleansed
)) + geom_bar() +

 scale_y_continuous(name = "Count of Properties") +
 scale_fill_discrete(name = "Neighbourhood") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

ggplotly(plot8,width = 1200, height = 600) 
plot9 <-  air_sub %>%
          select(names(.)) %>% 
                             ggplot(aes(x = 30-availability_30,
                                 fill = neighbourhood_group_cleansed
)) + geom_density( alpha = 0.4) +
 # geom_line(size = 0.5, alpha = 0.6) +
  scale_y_continuous(name = "Density") +
  scale_x_continuous(name = "Occupancy within 30 Days") +
  scale_fill_discrete(name = "Neighbourhood")

ggplotly(plot9, width = 1200,height = 600)
plot10 <-  air_sub %>%
          mutate(Occupancy = 30-availability_30) %>% 
          select(Occupancy,zipcode,neighbourhood_group_cleansed) %>% 
          group_by(neighbourhood_group_cleansed,zipcode) %>%
          summarize(Med_Occ = median(Occupancy)) %>% 
                        ggplot(aes(x = zipcode,
                                 y = Med_Occ,
                                 color = neighbourhood_group_cleansed
                                ,fill = neighbourhood_group_cleansed
)) + geom_histogram(stat = "identity") +
 # geom_line(size = 0.5, alpha = 0.6) +
  scale_y_continuous(name = "Median Ocupancy") +
  scale_x_discrete(name = "Zipcode") +
  scale_fill_discrete(name = "Neighbourhood")

ggplotly(plot10 , width = 1200,height = 600)

6 Business Definitions for Custom Attributes and Metrics

The whole EDA exercise has given us very good insights into the data and highlighted potential factors which can contribute to our decision making. Following are the Metrics that we created for assessing investment opportunities. One important thing here is that we will just be considering the Median Prices for ‘2017-06’ as our Cost of Properties. This is assumes that the investment will be immediate -

Range of Review Score Occupancy Rate(%)
Between 0 & 65 55
Between 66 & 75 65
Between 76 & 85 75
Between 86 & 95 85
Between 96 & 100 95

Below code chunk calculates the above discussed metrics and adds them to the final joined data set. Since Average Break Even Period is not at the same level as other Metrics, it will be calculated separately in the next section. The function ‘Metric_calc’ does the calculation for us and merges the new variables to the dataset which was given to it as input. The customized Occupancy Rate is calculated through the function ‘Occ_rate’. The final results are stored in ‘results’.

Occ_rate <- function(y) { 
 if(y >= 0 & y <=   65)     {return(0.55)} else
 if(y >= 66 & y <=  75)     {return(0.65)} else
 if(y >= 76 & y <=  85)     {return(0.75)} else
 if(y >= 86 & y <=  95)     {return(0.85)} else
 if(y >= 96 & y <=  100)    {return(0.95)} else {return(0.01)}
}

Metric_calc <- function(x) {
  x %<>% filter(YYYY_MM == '2017-06')
  x$OO <- 30 - x$availability_30 + 1
  x$OR <- sapply(x$review_scores_rating, function(x) Occ_rate(x) )
  x$MRO <- x$OR * x$price  * x$OO  
  x$BEP <- x$Med_Price/x$MRO
  x$BEP_Rank <- rank(x$BEP,ties.method = c("random")) 
  return(x)
}
results <- Metric_calc(joined)
head.cust(results)
neighbourhood_group_cleansed bedrooms price minimum_nights availability_30 review_scores_rating zipcode city id.air City SizeRank YYYY_MM Med_Price id.zil OO OR MRO BEP BEP_Rank
Staten Island 2 110 2 1 0 10305 Staten Island 1 new york 2087 2017-06 425100 3022 30 0.55 1815.00 234.2149 221
Staten Island 2 150 2 0 0 10305 Staten Island 2 new york 2087 2017-06 425100 3022 31 0.55 2557.50 166.2170 70
Brooklyn 2 125 1 0 94 11201 Brooklyn 3 new york 32 2017-06 1420700 3006 31 0.85 3293.75 431.3321 693
Brooklyn 2 250 3 0 96 11217 Brooklyn 4 new york 1555 2017-06 1302300 3018 31 0.95 7362.50 176.8829 99
Brooklyn 2 180 2 0 100 11201 Brooklyn 5 new york 32 2017-06 1420700 3006 31 0.95 5301.00 268.0060 331
Brooklyn 2 250 3 0 0 11217 Brooklyn 6 new york 1555 2017-06 1302300 3018 31 0.55 4262.50 305.5249 446

7 Break Even Period Analysis

We will calculate the Average Break Even Period For Zipcodes and Negihbourhoods. Lesser the number, better the investment. So based on increasing order of Average Break Even Period, we will rank the Zipcodes. A lower rank means a better investment. Below graph helps us visualize our results

plot11 <-  results %>%
                               ggplot(aes(x = zipcode,
                                 y = BEP,
                                 color = neighbourhood_group_cleansed
                                ,fill = neighbourhood_group_cleansed
)) + geom_boxplot() +
     scale_y_continuous(name = "Break Even Period ",
            limits = quantile(results$BEP, c(0, 0.95))) +
     scale_x_discrete(name = "Zipcode") +
     scale_fill_discrete(name = "Neighbourhood")

ggplotly(plot11 , width = 1200,height = 600)
plot12 <-  results %>%
                               ggplot(aes(x = zipcode,
                                 y = BEP_Rank,
                                 color = neighbourhood_group_cleansed
                                ,fill = neighbourhood_group_cleansed
)) + geom_boxplot() +
 # geom_line(size = 0.5, alpha = 0.6) +
 scale_y_continuous(name = "Break Even Period Rank") +
  scale_x_discrete(name = "Zipcode") +
  scale_fill_discrete(name = "Neighbourhood")

ggplotly(plot12 , width = 1200,height = 600)
results %>% filter(BEP_Rank < 11) %>%
  select(zipcode,neighbourhood_group_cleansed,BEP_Rank,BEP) %>%
  arrange(BEP_Rank) %>%   kable()  %>%
  kable_styling(bootstrap_options = c("responsive","striped")) %>%
  scroll_box(width = "100%", height = "200px")
zipcode neighbourhood_group_cleansed BEP_Rank BEP
10003 Manhattan 1 33.57967
10025 Manhattan 2 55.95308
10022 Manhattan 3 59.57771
10021 Manhattan 4 66.55425
10025 Manhattan 5 72.07253
11215 Brooklyn 6 81.32452
10025 Manhattan 7 83.92962
10025 Manhattan 8 93.25513
10014 Manhattan 9 97.42326
10036 Manhattan 10 103.03933
 temp <- results %>%
  select(zipcode,neighbourhood_group_cleansed,BEP,BEP_Rank) %>% 
  group_by(neighbourhood_group_cleansed,zipcode) %>% 
  summarize(Count = n(),Avg_BEP = mean(BEP),Med_BEP = median(BEP)) %>% 
    arrange(Med_BEP )

temp$Rank <- seq(1,nrow(temp),1) 
temp %>%   kable()  %>%
  kable_styling(bootstrap_options = c("responsive","striped")) %>%
  scroll_box(width = "100%", height = "200px")
neighbourhood_group_cleansed zipcode Count Avg_BEP Med_BEP Rank
Staten Island 10304 2 155.4434 155.4434 1
Staten Island 10305 5 196.0385 169.8192 2
Brooklyn 11215 118 353.3489 268.2114 3
Brooklyn 11231 64 405.6611 281.6931 4
Manhattan 10025 93 444.4788 283.4225 5
Brooklyn 11217 75 313.8556 288.2311 6
Staten Island 10306 1 291.1236 291.1236 7
Staten Island 10308 1 301.4354 301.4354 8
Manhattan 10036 100 626.0552 303.2394 9
Brooklyn 11201 56 384.1950 326.7442 10
Manhattan 10021 18 494.1516 331.5212 11
Manhattan 10022 36 558.3088 342.0790 12
Manhattan 10023 52 559.1872 357.9757 13
Manhattan 10011 95 1091.6234 380.7068 14
Manhattan 10128 51 983.0077 407.3162 15
Manhattan 10014 86 720.6680 420.2752 16
Manhattan 10003 121 675.4000 452.0000 17
Manhattan 10028 28 896.2868 516.2119 18
Queens 11434 9 1589.6021 534.6853 19
Manhattan 10013 75 1311.5729 650.2941 20
Brooklyn 11234 5 2974.9602 2018.1972 21
Staten Island 10312 1 3002.1142 3002.1142 22

8 Recommendation

9 Future Scope

10 Assumptions & Considerations

END OF DOCUMENT